Data collection methods:
Administrative and spatial, observational, interviews, surveys

PSCI 2270 - Week 7

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

October 17, 2024

Plan for this week



  1. 3-Page Proposal and OSF

  2. Data collection methods

  3. Discussion of two papers

3-Page Proposal and OSF

3-Page Proposal




  • Due: Next Thursday (October 24) before class

  • Aim: Brief description of your project that follows final write-up structure

  • Submission: To OSF (add me as collaborator) and post link on Brightspace

Proposal structure


  1. Background and Literature Review: 1-2 paragraphs

  2. Research Question and Theory: 1-2 paragraphs

  3. Setting/Context: 1 paragraph

  4. Independent Variables: 1 paragraph

  5. Dependent Variables (Outcomes of Interest): 1 paragraph

  6. Measurement: 1-2 paragraphs

  7. Possible Issues: 1-3 paragraphs

  • Final project includes more details on everything + estimation procedures with R code if applicable

Reviewing Literature



  • Why is it important to study your dependent variable? E.g. what are its general effects? Or are there normative reasons?

  • What do we know about the causes of your dependent variable?

  • What methods and data do scholars use to study your dependent variable?

  • Which particular cases (countries, regions, social media, etc.) do they study?

Useful Online Resources for Scientific Literature



General Search Engines

  • Google Scholar: Comprehensive and easy to use; covers a wide range of disciplines, but may include non-peer-reviewed content and sometimes lacks full-text access.
  • Vanderbilt Library: Less comprehensive than Google Scholar, but includes more options for full-text access.

Academic Databases

  • JSTOR: High-quality, peer-reviewed journals and books; strong in humanities and social sciences, but limited access to recent publications and requires subscription for full access.

  • ScienceDirect: Extensive collection of scientific and technical research articles, but primarily focused on science and technology and requires subscription for full access.

  • Web of Science: Comprehensive citation database; useful for citation analysis, but requires institutional access and has a complex interface.

Advanced Tools

  • Semantic Scholar: AI-powered search engine that provides relevant literature and citation analysis, but coverage may not be as extensive as other databases and is primarily focused on scientific literature.

  • Research Rabbit: Helps discover related papers and visualize connections between research topics, but is a newer tool with evolving features and may involve a learning curve.

  • Elicit: AI tool for literature review that helps in finding and summarizing research papers, but is limited to specific research questions and may not cover all disciplines.

Reading Yourself

  • Abstract: Short summary; make sure you understand this!

  • Introduction:

    1. The questions the paper will try to answer
    2. Why it’s important to know those answers
    3. A summary of what the answers are and how they were found
  • Theory:

    1. The outcome variable (thing to be explained or measured)
    2. The independent variables (things that explain outcome)
    3. Hypotheses about measure of or effects on outcome
  • Data/Methods:

    1. How and what data is collected
    2. How variables are measured using this data
    3. Technique(s)/Method(s) used to test the hypotheses
  • Results:

    1. Do estimated relationships correspond with hypotheses?
    2. Statistical and substantive significance of estimates
    3. Checks of alternative explanations
  • Conclusion: Broader implications for the field of study

  • Appendix/Replication archive: Usually online; all details needed to verify the procedures and results and possibly to replicate

Let’s look over the example


Open Science Framework



  • Resource for storing all materials related to your study (except data/replication materials)

  • Each project is stored in a repository with version control (?)

  • One of the largest repositories of pre-analysis plans for research projects

  • Let’s go ahead and create a project: osf.io

Data Collection Methods

Data Collection Methods



  • Document analysis: Use of any audio, visual, or written materials as a source of data

  • Interview data: Data that are collected from responses to questions posed by the researcher to a respondent

  • Firsthand observation: Data that may be collected by making observations in a field study or in a laboratory setting

Everything is possible (!)

  • Question: How can we study factors that affect protest participation?

    • Case study: Lohmann (1994)
    • Process-tracing: Pearlman (2013)
    • Laboratory experiment: Young (2019)
    • Social network analysis: Larson et al. (2019)
    • Using original surveys: Boulianne and Lee (2022)
    • Field experiments: Bursztyn et al. (2021)
    • Original data on protests: Steinert-Threlkeld (2017)

Protests in space

Geolocation and spatial data


  • “Geographic Information Systems (GIS) in International Relations” by Jordan Branch

Geographic Information Systems (GIS) are being applied with increasing frequency, and with increasing sophistication, in international relations and in political science more generally. Their benefits have been impressive: analyses that simply would not have been possible without GIS are now being completed, and the spatial component of international politics—long considered central but rarely incorporated analytically—has been given new emphasis. However, new methods face new challenges, and to apply GIS successfully, two specific issues need to be addressed: measurement validity and selection bias. Both relate to the challenge of conceptualizing nonspatial phenomena with the spatial tools of GIS. Significant measurement error can occur when the concepts that are coded as spatial variables are not, in fact, validly measured by the default data structure of GIS, and selection bias can arise when GIS systematically excludes certain types of units. Because these potential problems are hidden by the technical details of the method, GIS data sets and analyses can sometimes appear to overcome these challenges when, in fact, they fail to do so. Once these issues come to light, however, potential solutions become apparent—including some in existing applications in international relations and in other fields.

What is GIS?



  • Geographic Information System (GIS): Any system for the collection and analysis of data that are coded spatially (by location). Generally involves the use of a GIS software package for the creation and analysis of spatial data.
  • Vector data: Points, lines, and polygons to describe spatial features: a point for a feature at a single location, a line for a linear feature such as a road, or a polygon for a feature that covers a definable spatial area.

  • Raster data: Pixels, predefined equivalent-sized units that are then assigned a value for a single variable across the entire area covered by the data.
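A minimal Python sketch of the two data models, assuming made-up coordinates and cell values (the names `protest_site`, `road`, `district`, and `raster` are illustrative and do not come from any GIS package). Vector features are stored as coordinate lists, while a raster is a grid of equal-sized cells that each hold one value of a single variable.

```python
# Vector data: features described by coordinates (hypothetical values).
protest_site = (2.5, 1.5)                                     # point
road = [(0.0, 0.0), (2.0, 1.0), (4.0, 1.0)]                   # line
district = [(1.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 3.0)]   # polygon

# Raster data: equal-sized cells, one value per cell for one variable.
raster = [
    [0.1, 0.2, 0.4, 0.3],   # row 0 covers y in [0, 1)
    [0.0, 0.5, 0.9, 0.6],   # row 1 covers y in [1, 2)
    [0.0, 0.1, 0.3, 0.2],   # row 2 covers y in [2, 3)
]
CELL_SIZE = 1.0


def point_in_polygon(point, polygon):
    """Ray-casting test: is the point inside the polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def raster_value(point, grid, cell_size):
    """Return the value of the raster cell that contains the point."""
    x, y = point
    col = int(x // cell_size)
    row = int(y // cell_size)
    return grid[row][col]


print(point_in_polygon(protest_site, district))
print(raster_value(protest_site, raster, CELL_SIZE))
```

The same location can therefore be queried in both models: the polygon answers "which unit is this point in?", and the raster answers "what is the value of the variable here?".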

Raster data (Peisakhin and Rozenas 2018)


Vector data (Korovkin and Makarin 2023)


Examples


  • Stasavage, David. 2011. States of Credit: Size, Power, and the Development of European Polities. Princeton, NJ: Princeton University Press.

    • Shape or scale of polity/country matters
  • Starr, Harvey. 2013. On Geopolitics: Space, Place, and International Relations. Boulder, CO: Paradigm.

    • Shared borders and interaction across them
  • Cederman, Lars-Erik, Kristian Skrede Gleditsch, and Halvard Buhaug. 2013. Inequality, Grievances, and Civil War. New York: Cambridge University Press.

    • Location and size of ethnic or rebel groups
  • What else?

Issues



  • Measurement validity:

    • Changes in boundaries (and their meaning) over time
    • Changes in meaning of boundaries across space
  • Selection bias:

    • Exclusion of units based on inability to code them geographically
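The selection-bias point can be illustrated with a small Python sketch on invented data (the events, sizes, and coordinates below are hypothetical). If small rural events are harder to geocode, dropping them changes the estimate.

```python
# Hypothetical protest events; some could not be placed on a map.
events = [
    {"place": "capital",   "coords": (1.0, 2.0), "size": 5000},
    {"place": "city A",    "coords": (3.0, 1.0), "size": 3000},
    {"place": "village B", "coords": None,       "size": 200},
    {"place": "village C", "coords": None,       "size": 100},
    {"place": "town D",    "coords": (2.0, 4.0), "size": 700},
]


def mean_size(rows):
    """Average number of participants across events."""
    return sum(r["size"] for r in rows) / len(rows)


# A GIS data set keeps only the events that have coordinates.
geocoded = [e for e in events if e["coords"] is not None]

print(mean_size(events))    # all events
print(mean_size(geocoded))  # geocoded events only
```

In this toy example the geocoded subset overstates average protest size because the excluded events are systematically smaller.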

Discussion

What kind of geolocation or spatial data can we use to study determinants of protest activity?

Protests in text

Text as data


  • “Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges” by John Wilkerson and Andreu Casas

Text has always been an important data source in political science. What has changed in recent years is the feasibility of investigating large amounts of text quantitatively. The internet provides political scientists with more data than their mentors could have imagined, and the research community is providing accessible text analysis software packages, along with training and support. As a result, text-as-data research is becoming mainstream in political science. Scholars are tapping new data sources, they are employing more diverse methods, and they are becoming critical consumers of findings based on those methods. In this article, we first describe the four stages of a typical text-as-data project. We then review recent political science applications and explore one important methodological challenge—topic model instability—in greater detail.

Four stages of text analysis



  1. Obtaining text: web-scraping vs crowdsourcing
  2. From text to data: supervised vs unsupervised methods (or LLMs nowadays?)
  3. Quantitative analysis of text: analysis of counts vs predictions with machine learning
  4. Evaluating performance: gold standard vs out-of-sample validation
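The stages above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the corpus, the hand-coded `gold` labels, and the `PROTEST_WORDS` keyword list are all invented, and the keyword rule stands in for a real supervised or unsupervised model.

```python
import re
from collections import Counter

# Stage 1 (obtaining text) is assumed done: a hypothetical toy corpus.
docs = [
    "Thousands march against the new tax law",
    "Parliament debates the tax law",
    "Students march and strike over tuition",
    "Minister presents the annual budget",
]
gold = [1, 0, 1, 0]  # hypothetical hand-coded labels: 1 = about protest


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Stage 2: from text to data (bag-of-words counts per document).
counts = [Counter(tokenize(d)) for d in docs]

# Stage 3: quantitative analysis (a simple keyword rule; an assumption).
PROTEST_WORDS = {"march", "strike", "protest", "rally"}
predicted = [int(any(c[w] > 0 for w in PROTEST_WORDS)) for c in counts]

# Stage 4: evaluating performance against the gold standard.
accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)

print(predicted, accuracy)
```

With real data the gold-standard labels would come from human coders, and accuracy would be computed on documents that were not used to build the classifier.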

Four uses of text data



  • Classification: Unsupervised machine learning methods compare the similarity of documents based on co-occurring features

  • Scaling: Use texts to locate political actors in ideological space

  • Text Reuse: Explicitly value word sequencing in judging document similarity

  • Natural Language Processing: Moving from “whom?” to “who did what to whom?”
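A short Python sketch of why word order matters for text reuse and for "who did what to whom". The two sentences are invented, and Jaccard overlap of word trigrams is only one simple way to measure sequence similarity, used here as an assumption.

```python
def tokens(text):
    """Split a sentence into lowercase words."""
    return text.lower().split()


def ngrams(words, n):
    """Set of consecutive word sequences of length n."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of items that two sets have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "the police attacked the protesters"
doc_b = "the protesters attacked the police"

# Bag of words ignores order, so the two documents look identical.
bag_similarity = jaccard(set(tokens(doc_a)), set(tokens(doc_b)))

# Trigrams keep word order, so the two documents share nothing.
trigram_similarity = jaccard(ngrams(tokens(doc_a), 3), ngrams(tokens(doc_b), 3))

print(bag_similarity, trigram_similarity)
```

The two sentences use the same words but describe opposite events, which a bag-of-words representation cannot detect.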

Issues



  • Measurement reliability:

    • Unsupervised Machine Learning can produce different results across runs (e.g. topic model instability)
    • This is also an issue with Large Language Models (e.g. ChatGPT)
  • Measurement validity:

    • Unsupervised Machine Learning and Large Language Models are black boxes
  • Selection bias:

    • We often do not have access to the full universe of documents (e.g. API restrictions)
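The reliability issue can be illustrated with a toy Python analogue. This is not a topic model: it is plain k-means on invented one-dimensional document scores, and the function `kmeans_1d` is written for this sketch. The point is only that the same unsupervised method can settle on different groupings depending on where it starts.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain k-means on one-dimensional data from given starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                new_centroids.append(sum(members) / len(members))
            else:
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


# Hypothetical document scores that form three natural groups.
scores = [0, 1, 2, 10, 11, 12, 20, 21, 22]

# Two runs asking for two clusters, with different starting points.
run_a = kmeans_1d(scores, centroids=[1, 16])
run_b = kmeans_1d(scores, centroids=[6, 21])

print(run_a)
print(run_b)
```

Both runs converge, but they merge the middle group with different neighbors, so a researcher who ran the method only once would not see the disagreement.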

Discussion

What kind of text data can we use to study determinants of protest activity?

Protests in the field

Field research


  • “Field research” by Elisabeth Wood

This article addresses “why leave the office” questions, primarily through a discussion of exemplary works that draw on field research. The first section focuses on James Scott’s Weapons of the Weak, which is a classic work of field research in comparative politics. It then turns to some recent works that explore several related topics, using a combination of surveys, participant observation, and interviews. Other field research methods, trends toward natural and field experiments, and combinations of field methods and non-field methods are also discussed. The last portion of the article concentrates on the challenges that field researchers encounter, irrespective of the particular method they use.

Types of field research



  • Participant observation

  • In-depth interviews

  • Survey

  • Experimentation

Examples


  • Scott, James. 1987. “Weapons of the weak: everyday forms of peasant resistance”

    • Two years in a Malaysian village in the Muda region of north-western Malaysia.
  • Posner, Daniel. 2004. “The political salience of cultural difference: why Chewas and Tumbukas are allies in Zambia and adversaries in Malawi.”

    • Two ethnically distinct groups on the Malawian and Zambian sides of the border.
  • Chattopadhyay, R., and Duflo, E. 2004. “Women as policy makers: evidence from a randomized policy experiment in India.”

    • Explores random assignment of gender quotas in local councils in India.

Examples



  • Wantchekon, Leonard. 2003. “Clientelism and voting behavior: evidence from a field experiment in Benin.”

    • Randomly (!) assign party appeals during presidential elections in Benin.
  • Weinstein, Jeremy. 2006. “Inside rebellion: the politics of insurgent violence.”

    • Interviews and shadowing of four rebel groups across different contexts.
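Random assignment, the defining step of the field experiments above, can be sketched in Python. This is a generic complete randomization under stated assumptions, not the design of any cited study: the village names, the seed, and the function `assign_treatment` are all hypothetical.

```python
import random


def assign_treatment(units, seed):
    """Randomly place half of the units in treatment, the rest in control."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    n_treated = len(shuffled) // 2
    treated = set(shuffled[:n_treated])
    return {u: ("treatment" if u in treated else "control") for u in units}


villages = [f"village_{i}" for i in range(1, 9)]  # hypothetical units
assignment = assign_treatment(villages, seed=2270)

for village, arm in assignment.items():
    print(village, arm)
```

Recording the seed lets others reproduce the assignment exactly, which is one reason it belongs in a pre-analysis plan.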

Issues



  • Personal biases:

    • Are we too influenced by our theories? How do we abstract from this?
  • Selection bias:

    • How does the researcher choose the cases?
  • Ethics:

    • How do we protect study participants?
    • How do we compensate?
    • How do we handle lies?

Discussion

What kind of field research can we use to study determinants of protest activity?

References


Boulianne, Shelley, and Sangwon Lee. 2022. “Conspiracy Beliefs, Misinformation, Social Media Platforms, and Protest Participation.” Media and Communication 10 (4). https://doi.org/10.17645/mac.v10i4.5667.
Bursztyn, Leonardo, Davide Cantoni, David Y Yang, Noam Yuchtman, and Y Jane Zhang. 2021. “Persistent Political Engagement: Social Interactions and the Dynamics of Protest Movements.” American Economic Review: Insights 3 (2): 233–50.
Korovkin, Vasily, and Alexey Makarin. 2023. “Conflict and Intergroup Trade: Evidence from the 2014 Russia-Ukraine Crisis.” American Economic Review 113 (1): 34–70. https://doi.org/10.1257/aer.20191701.
Larson, Jennifer M., Jonathan Nagler, Jonathan Ronen, and Joshua A. Tucker. 2019. “Social Networks and Protest Participation: Evidence from 130 Million Twitter Users.” American Journal of Political Science 63 (3): 690–705. https://doi.org/10.1111/ajps.12436.
Lohmann, Susanne. 1994. “The Dynamics of Informational Cascades: The Monday Demonstrations in Leipzig, East Germany, 1989–91.” World Politics 47 (1): 42–101. https://doi.org/10.2307/2950679.
Pearlman, Wendy. 2013. “Emotions and the Microfoundations of the Arab Uprisings.” Perspectives on Politics 11 (2): 387–409. https://doi.org/10.1017/s1537592713001072.
Peisakhin, Leonid, and Arturas Rozenas. 2018. “Electoral Effects of Biased Media: Russian Television in Ukraine.” American Journal of Political Science 62 (3): 535–550.
Steinert-Threlkeld, Zachary C. 2017. “Spontaneous Collective Action: Peripheral Mobilization During the Arab Spring.” American Political Science Review 111 (2): 379–403. https://doi.org/10.1017/s0003055416000769.
Young, Lauren E. 2019. “The Psychology of State Repression: Fear and Dissent Decisions in Zimbabwe.” American Political Science Review 113 (1): 140–55. https://doi.org/10.1017/S000305541800076X.